TIME SERIES ARMA MODEL IDENTIFICATION BY ESTIMATING INFORMATION


Emanuel Parzen

Institute of Statistics
Texas A&M University


Statisticians, economists, and system engineers are becoming aware that to identify models for time series and dynamic systems, information theoretic ideas can play a valuable (and unifying) role. Models for time series $Y(t)$ can be formulated as hypotheses concerning the information about $Y(t)$ given various bases involving past, current, and future values of $Y(\cdot)$ and related time series $X(\cdot)$. To determine sets of variables that are sufficient to forecast $Y(t)$, and especially to determine an ARMA model for $Y(t)$, an approach is presented which estimates and compares various information increments. We discuss how to non-parametrically estimate the MA($\infty$) representation, and use it to form estimators of the many information numbers that one might compare to identify an approximating ARMA model for a univariate time series.


1.  Information Measures


The information approach to model identification formulates a model (or hypothesis about the probability law of random variables or time series) as a hypothesis that an information number is zero. Information measures for random variables are defined in terms of information measures for probability densities. The latter can be regarded as defining "distances" between probability measures.

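Let $f(y)$ and $g(y)$ be two probability densities on the real line, $-\infty < y < \infty$. The information divergence of index $\alpha$ of a (model) $g$ from (a true density) $f$ is defined for $\alpha = 1$ (index 1) by

$$I_1(f;g) = \int_{-\infty}^{\infty} \Big\{-\log \frac{g(y)}{f(y)}\Big\}\, f(y)\, dy$$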

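and for $\alpha > 0$ (but $\alpha \neq 1$) by

$$I_\alpha(f;g) = -\frac{1}{1-\alpha}\, \log \int_{-\infty}^{\infty} \Big\{\frac{g(y)}{f(y)}\Big\}^{1-\alpha} f(y)\, dy .$$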

Information divergence of index 1 has a preferred role because it has an important decomposition

$$I_1(f;g) = H(f;g) - H(f),$$

defining

$$H(f;g) = \int_{-\infty}^{\infty} \{-\log g(y)\}\, f(y)\, dy,$$

$$H(f) = H(f;f) = \int_{-\infty}^{\infty} \{-\log f(y)\}\, f(y)\, dy .$$

We call $H(f;g)$ the cross-entropy of $f$ and $g$, and $H(f)$ the entropy of $f$. Information divergence of index 1 is usually referred to just as information divergence $I(f;g)$.

The information $I(Y|X)$ about a continuous random variable $Y$ in a continuous random variable $X$ is defined by

$$I(Y|X) = I(f_{Y|X};\, f_Y) = E_X\, I(f_{Y|X=x};\, f_Y).$$

The entropy of $Y$ and conditional entropy of $Y$ given $X$ are defined by

$$H(Y) = H(f_Y),$$

$$H(Y|X) = H(f_{Y|X}) = E_X\, H(f_{Y|X=x}) .$$

One can establish a fundamental decomposition:

$$I(Y|X) = H(Y) - H(Y|X) .$$

The most fundamental concept used in identifying models by estimating information is $I(Y|X_1;\, X_1, X_2)$, the information about $Y$ in $X_2$ conditional on $X_1$; it is defined

$$I(Y|X_1;\, X_1,X_2) = H(f_{Y|X_1}) - H(f_{Y|X_1,X_2}) = H(Y|X_1) - H(Y|X_1,X_2) . \tag{I}$$

A fundamental formula to evaluate $I(Y|X_1;\, X_1,X_2)$ is

$$I(Y|X_1;\, X_1,X_2) = I(Y|X_1,X_2) - I(Y|X_1) . \tag{II}$$

When $X$ and $Y$ are jointly normal random variables, $f_{Y|X=x}(y)$ is a normal distribution whose variance (which does not depend on $x$) is denoted $\Sigma(Y|X)$. The variance of $Y$ is denoted $\Sigma(Y)$. The entropy and conditional entropy of $Y$ are

$$H(Y) = \tfrac{1}{2} \log \Sigma(Y) + \tfrac{1}{2}(1 + \log 2\pi),$$

$$H(Y|X) = \tfrac{1}{2} \log \Sigma(Y|X) + \tfrac{1}{2}(1 + \log 2\pi) .$$

The information about $Y$ in $X$ when $X$ and $Y$ are bivariate normal, with correlation coefficient $\rho$, can be expressed

$$I(Y|X) = -\tfrac{1}{2} \log \Sigma^{-1}(Y)\,\Sigma(Y|X) = -\tfrac{1}{2} \log (1-\rho^2) . \tag{III}$$

When $Y$ and $X$ are jointly multivariate normal random vectors, let $\Sigma$ denote a covariance matrix. One can show that

$$I(Y|X) = \left(-\tfrac{1}{2}\right) \log \det \Sigma^{-1}(Y)\,\Sigma(Y|X) = \left(-\tfrac{1}{2}\right) \text{sum log eigenvalues } \Sigma^{-1}(Y)\,\Sigma(Y|X) . \tag{IV}$$
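Formula (IV) can be evaluated numerically from a joint covariance matrix. The following is a minimal sketch, assuming NumPy; the function name `gaussian_information` and the index-list interface are illustrative choices, not part of the paper. The conditional covariance is obtained as a Schur complement, and the bivariate case is compared against formula (III).

```python
import numpy as np

def gaussian_information(cov, y_idx, x_idx):
    """I(Y|X) = -1/2 * sum of log eigenvalues of Sigma(Y)^{-1} Sigma(Y|X).

    cov   : joint covariance matrix of (Y, X), assumed positive definite
    y_idx : indices of the components forming Y (hypothetical interface)
    x_idx : indices of the components forming X
    """
    cov = np.asarray(cov, dtype=float)
    s_yy = cov[np.ix_(y_idx, y_idx)]
    s_xx = cov[np.ix_(x_idx, x_idx)]
    s_yx = cov[np.ix_(y_idx, x_idx)]
    # Conditional covariance Sigma(Y|X) as a Schur complement.
    s_cond = s_yy - s_yx @ np.linalg.solve(s_xx, s_yx.T)
    # Eigenvalues of Sigma(Y)^{-1} Sigma(Y|X); real and positive in exact arithmetic.
    eig = np.linalg.eigvals(np.linalg.solve(s_yy, s_cond)).real
    return -0.5 * np.sum(np.log(eig))

# Bivariate check against formula (III): I(Y|X) = -1/2 log(1 - rho^2).
rho = 0.6
info_bivariate = gaussian_information([[1.0, rho], [rho, 1.0]], [0], [1])
```

In the bivariate case the single eigenvalue is $1-\rho^2$, so the two formulas agree.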


Research supported by the Office of Naval Research under contract no. N00014-82-MP-[illegible].
To illustrate the information approach to model identification (or determining relations between random variables) consider the general problem of testing the hypothesis $H_0$: $X$ and $Y$ are independent. One could express $H_0$ in any one of the following equivalent ways:

$H_0$: $f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ for all $x$ and $y$;

$H_0$: $f_{Y|X=x}(y) = f_Y(y)$ for all $x$ and $y$;

$H_0$: $I(f_{X,Y};\, f_X f_Y) = 0$;

$H_0$: $I(Y|X) = 0$.

The information approach to testing $H_0$ is to form an estimator $\hat I(Y|X)$ of $I(Y|X)$, and test whether it is significantly different from zero. One can distinguish several types of estimators of $I(Y|X)$: (a) fully parametric; (b) fully non-parametric; (c) functionally parametric, which uses functional statistical inference smoothing techniques to estimate $I(Y|X)$ [see Woodfield (1982)].

An example of fully parametric estimators arises when one assumes $X$ and $Y$ are bivariate normal with correlation coefficient $\rho$. Given a random sample $(X_1,Y_1), \ldots, (X_n,Y_n)$, a fully parametric estimator of $I(Y|X)$ is the maximum likelihood estimator

$$\hat I(Y|X) = -\tfrac{1}{2} \log (1 - \hat\rho^2)$$

where $\hat\rho$ is the sample correlation coefficient.
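The fully parametric estimator can be sketched in a few lines. This is an illustration only, assuming NumPy and simulated bivariate normal data; the function name `information_from_sample`, the seed, and the sample size are hypothetical. A dependent pair should give an estimate near $-\tfrac12\log(1-0.8^2) \approx 0.51$, while an independent pair should give a value close to zero.

```python
import numpy as np

def information_from_sample(x, y):
    """Plug-in estimator of I(Y|X) under bivariate normality:
    -1/2 * log(1 - rho_hat^2), with rho_hat the sample correlation."""
    rho_hat = np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]
    return -0.5 * np.log(1.0 - rho_hat**2)

rng = np.random.default_rng(0)            # fixed seed for reproducibility (assumption)
n = 2000
x = rng.standard_normal(n)
y_dep = 0.8 * x + 0.6 * rng.standard_normal(n)   # true correlation 0.8
y_ind = rng.standard_normal(n)                   # independent of x

info_dep = information_from_sample(x, y_dep)
info_ind = information_from_sample(x, y_ind)
```

Whether `info_ind` is "significantly different from zero" still requires a reference distribution or a penalty term; the sketch only computes the point estimate.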

2.  Information and Memory Approach to Time Series Model Identification

The approach to time series analysis developed by Parzen distinguishes four general types of time series models:

1.  No memory or white noise

2.  Short memory or stationary

3.  Long memory (or non-stationary)

3a. Long memory: transform to short memory
3b. Long memory: long memory plus short memory.

Memory type can be defined in terms of the information numbers

$$I_m = I(Y(t) \,|\, Y(t-1), \ldots, Y(t-m)) ;$$

in words, $I_m$ is the information about a time series $Y(\cdot)$ at time $t$ in the $m$ most recent values $Y(t-1), \ldots, Y(t-m)$. Let $Y^-$ denote the infinite past $Y(t-1), Y(t-2), \ldots$. As $m$ tends to $\infty$, $I_m$ tends to

$$I_\infty = I(Y|Y^-) = I(Y(t) \,|\, Y(t-1), \ldots) .$$

We define a time series $Y(t)$, $t = 0, \pm 1, \ldots$ to be:

no memory if $I_\infty = 0$
short memory if $0 < I_\infty < \infty$
long memory if $I_\infty = \infty$

The models we build for a time series depend on its memory type. A model corresponds to a transformation of the time series to a no memory (white noise) series. Therefore a no memory (white noise) time series requires no further modeling, although one may be interested in determining such statistical characteristics as the mean, variance, and probability distribution.

A short memory time series $Y(t)$ is modeled by an invertible filter which transforms it to white noise:

$$Y(t) \;\longrightarrow\; \boxed{\text{Innovations filter } g_\infty} \;\longrightarrow\; Y^{\nu}(t)$$

where $Y^{\nu}(t)$ is the innovation series, or series of infinite memory one-step ahead prediction errors, defined by [using $\nu$ to connote "what's new"]

$$Y^{\nu}(t) = Y(t) - Y^{\mu}(t) .$$

The predictor $Y^{\mu}(t)$ is denoted

$$Y^{\mu}(t) = E[Y(t) \,|\, Y(t-1), Y(t-2), \ldots] = (Y \,|\, Y_{-1}, Y_{-2}, \ldots)(t) .$$

We use $\mu$ as the superscript for a predictor to indicate that it is an averaging operator.

The infinite memory mean square prediction error $\sigma_\infty^2$ is defined as the normalized variance

$$\sigma_\infty^2 = E[|Y^{\nu}(t)|^2] \div E[|Y(t)|^2] .$$

The appropriateness of normalizing is justified by the formula for information:

$$I_\infty = -\tfrac{1}{2} \log \sigma_\infty^2 .$$

If a time series $Y(t)$, $t = 0, \pm 1, \ldots$ is a zero mean Gaussian stationary time series, its probability law can be described by the covariance function

$$R(v) = E[Y(t)\,Y(t+v)] ,$$

and correlation function

$$\rho(v) = \frac{R(v)}{R(0)} = \mathrm{Corr}\,[Y(t), Y(t+v)] .$$

Alternatively the probability law of $Y(\cdot)$ can be described by the spectral density function $f$ which is defined by

$$f(\omega) = \sum_{v=-\infty}^{\infty} e^{-2\pi i v \omega}\, \rho(v), \qquad 0 \le \omega \le 1,$$

when $\sum_{v=-\infty}^{\infty} |\rho(v)| < \infty$. The frequency variable $\omega$ is usually assumed to vary in the interval $-0.5 \le \omega \le 0.5$. But only the interval $0 \le \omega \le 0.5$ has physical significance. We prefer the interval $0 \le \omega \le 1$ for mathematical reasons.

Perhaps the most insightful way to model a short memory time series is by representing it, or approximating it, by an ARMA($p,q$) scheme:

$$Y(t) + \alpha_p(1)\,Y(t-1) + \cdots + \alpha_p(p)\,Y(t-p) = \varepsilon(t) + \beta_q(1)\,\varepsilon(t-1) + \cdots + \beta_q(q)\,\varepsilon(t-q)$$

where the polynomials

$$g_p(z) = 1 + \alpha_p(1)\,z + \cdots + \alpha_p(p)\,z^p$$

$$h_q(z) = 1 + \beta_q(1)\,z + \cdots + \beta_q(q)\,z^q$$

are chosen so that all their roots in the complex $z$-plane are in the region $\{z : |z| > 1\}$ outside the unit circle. Then $g_p(z)$ and $h_q(z)$ are the transfer functions of invertible filters. $\varepsilon(t)$ is assumed to be a white noise time series which we identify with the innovations, $\varepsilon(t) = Y^{\nu}(t)$;

$$\sigma_{p,q}^2 = E[\varepsilon^2(t)] \div E[Y^2(t)]$$

is an estimator of $\sigma_\infty^2$. The spectral density of an ARMA($p,q$) scheme is

$$f_{p,q}(\omega) = \sigma_{p,q}^2\, \frac{|h_q(e^{2\pi i \omega})|^2}{|g_p(e^{2\pi i \omega})|^2} .$$
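As an illustration of the ARMA spectral density formula, the sketch below evaluates $f_{p,q}(\omega)$ on a frequency grid for the scheme $Y(t) - 0.5\,Y(t-1) = \varepsilon(t) + 0.5\,\varepsilon(t-1)$. It assumes NumPy; the function name `arma_spectral_density` and the grid size are illustrative, and the normalized innovation variance $3/7$ is worked out by hand from $1/(1+\sum_k \beta_k^2)$ with $\beta_k = 0.5^{k-1}$. Because the density is normalized by the variance of $Y$, it should integrate to 1 over $0 \le \omega \le 1$, and the exponential of the average of $\log f$ should recover the normalized innovation variance.

```python
import numpy as np

def arma_spectral_density(omega, ar, ma, sigma2):
    """f(w) = sigma2 * |h(e^{2 pi i w})|^2 / |g(e^{2 pi i w})|^2.

    ar : [alpha(1), ..., alpha(p)] in Y(t) + alpha(1) Y(t-1) + ... = ...
    ma : [beta(1), ..., beta(q)]  in ... = e(t) + beta(1) e(t-1) + ...
    """
    z = np.exp(2j * np.pi * np.asarray(omega, dtype=float))
    # np.polyval expects the highest-degree coefficient first.
    g = np.polyval(np.r_[1.0, ar][::-1], z)
    h = np.polyval(np.r_[1.0, ma][::-1], z)
    return sigma2 * np.abs(h) ** 2 / np.abs(g) ** 2

Q = 4096                                   # grid size (assumption)
omega = np.arange(Q) / Q
f = arma_spectral_density(omega, ar=[-0.5], ma=[0.5], sigma2=3.0 / 7.0)

total_mass = f.mean()                      # approximates the integral of f over [0, 1]
innovation_var = np.exp(np.log(f).mean())  # approximates exp(integral of log f)
```

The grid average is an accurate quadrature here because the integrand is smooth and periodic.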

The process of identifying ARMA($p,q$) schemes which are adequate (and parsimonious) approximating models for a time series can be studied by determining information characterizations of when the exact (or true) model is an AR($p$) or ARMA($p,q$).

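Let $\Sigma(Y \,|\, Y_{-1}, \ldots, Y_{-p},\, Y^{\nu}_{-1}, \ldots, Y^{\nu}_{-q})$ denote the mean square prediction error of $Y(t)$ when predicted by $Y(t-1), \ldots, Y(t-p)$, $Y^{\nu}(t-1), \ldots, Y^{\nu}(t-q)$, or equivalently the conditional variance of $Y(t)$ given $Y(t-1), \ldots, Y(t-p)$, $Y^{\nu}(t-1), \ldots, Y^{\nu}(t-q)$. Normalize it to form

$$\sigma_{p,q}^2 = \Sigma^{-1}(Y)\, \Sigma(Y \,|\, Y_{-1}, \ldots, Y_{-p},\, Y^{\nu}_{-1}, \ldots, Y^{\nu}_{-q}),$$

$$I_{p,q} = -\tfrac{1}{2} \log \sigma_{p,q}^2 .$$

The information difference between $Y_{-1}, \ldots, Y_{-p}$, $Y^{\nu}_{-1}, \ldots, Y^{\nu}_{-q}$ and $Y^-$ for prediction of $Y(t)$ satisfies

$$I(Y \,|\, Y_{-1}, \ldots, Y_{-p},\, Y^{\nu}_{-1}, \ldots, Y^{\nu}_{-q};\; Y^-) = I_\infty - I_{p,q} .$$

The following two hypotheses are equivalent:

$H_0$: $Y(\cdot)$ is ARMA($p,q$)

$H_0$: $I_\infty - I_{p,q} = 0$.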

3.  Information Calculation for ARMA Schemes

Given a sample $\{Y(t),\ t = 1, 2, \ldots, T\}$, we would like to estimate, for many values of $p, q$, the information differences (assuming normality)

$$I_\infty - I_{p,q} = -\tfrac{1}{2} \log \sigma_\infty^2 - \left(-\tfrac{1}{2} \log \sigma_{p,q}^2\right) .$$

We need to estimate $\sigma_\infty^2$ and $\sigma_{p,q}^2$. To understand the method we would like to propose, let us first discuss how to compute the true value of $\sigma_{p,q}^2$. The MA($\infty$), or infinite order moving average, representation of $Y(t)$ will play a central role:

$$Y(t) = Y^{\nu}(t) + \beta_1\, Y^{\nu}(t-1) + \beta_2\, Y^{\nu}(t-2) + \cdots .$$

Note that $E[|Y(t)|^2] = E[|Y^{\nu}(t)|^2]\,(1 + \beta_1^2 + \beta_2^2 + \cdots)$ so that

$$1 = \sigma_\infty^2\, (1 + \beta_1^2 + \beta_2^2 + \cdots) .$$

The correlations $\rho(v)$ can be computed by

$$\rho(v) = \sigma_\infty^2\, (\beta_v + \beta_1 \beta_{v+1} + \cdots) .$$

By using matrix sweep operations on the joint covariance matrix of $Y, Y_{-1}, \ldots, Y_{-p}, Y^{\nu}_{-1}, \ldots, Y^{\nu}_{-q}$ one can determine (in a stepwise manner) the conditional variance $\Sigma(Y \,|\, Y_{-1}, \ldots, Y_{-p},\, Y^{\nu}_{-1}, \ldots, Y^{\nu}_{-q})$ required to compute the information $I_{p,q}$.
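The calculation just described can be sketched for general $(p,q)$ as follows. This is an illustrative implementation under stated assumptions: NumPy is used, the MA($\infty$) sequence is truncated to finitely many terms, and the conditional variance is obtained by solving a linear system (a Schur complement) rather than by stepwise sweeps, which gives the same conditional variance. The function name `information_difference` is hypothetical. The covariances used are $\mathrm{Cov}(Y(t), Y^{\nu}(t-k)) = \sigma_\infty^2 \beta_k$ and $\mathrm{Cov}(Y(t-i), Y^{\nu}(t-k)) = \sigma_\infty^2 \beta_{k-i}$ for $k \ge i$ (zero otherwise), with $\beta_0 = 1$ and $Y$ normalized to unit variance.

```python
import numpy as np

def information_difference(beta, p, q):
    """Return (sigma2_inf, sigma2_pq, I_inf - I_pq) from truncated MA(inf)
    coefficients beta = [1, beta_1, beta_2, ...], with Var Y normalized to 1."""
    beta = np.asarray(beta, dtype=float)
    n = len(beta)
    sigma2 = 1.0 / np.sum(beta**2)          # 1 = sigma2 * (1 + beta_1^2 + ...)

    def rho(v):
        v = abs(v)
        return sigma2 * np.dot(beta[: n - v], beta[v:])

    m = p + q
    cov = np.zeros((m, m))                  # covariance of the predictors
    cross = np.zeros(m)                     # covariance of Y(t) with the predictors
    for i in range(1, p + 1):
        cross[i - 1] = rho(i)
        for j in range(1, p + 1):
            cov[i - 1, j - 1] = rho(i - j)
    for k in range(1, q + 1):
        cross[p + k - 1] = sigma2 * beta[k]
        cov[p + k - 1, p + k - 1] = sigma2  # innovations are uncorrelated
        for i in range(1, min(k, p) + 1):
            cov[i - 1, p + k - 1] = cov[p + k - 1, i - 1] = sigma2 * beta[k - i]
    cond_var = 1.0 - cross @ np.linalg.solve(cov, cross) if m > 0 else 1.0
    return sigma2, cond_var, 0.5 * np.log(cond_var / sigma2)

# ARMA(1,1) with a = b = 0.5: beta_k = (a + b) * a**(k - 1) for k >= 1.
a, b = 0.5, 0.5
k = np.arange(1, 200)
beta = np.concatenate(([1.0], (a + b) * a ** (k - 1)))
sigma2, var_11, diff_11 = information_difference(beta, 1, 1)
_, var_10, diff_10 = information_difference(beta, 1, 0)
_, var_01, diff_01 = information_difference(beta, 0, 1)
```

For this model the ARMA(1,1) predictor attains the innovation variance, so `diff_11` is zero up to rounding, while the AR(1) and MA(1) predictors leave positive information differences.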

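We illustrate the approach being proposed in the case $p = 1$, $q = 1$. The covariance matrix of $Y$, $Y_{-1}$, $Y^{\nu}_{-1}$ is (writing $\sigma^2$ for $\sigma_\infty^2$)

$$\Sigma = \begin{pmatrix} 1 & \rho(1) & \sigma^2 \beta_1 \\ \rho(1) & 1 & \sigma^2 \\ \sigma^2 \beta_1 & \sigma^2 & \sigma^2 \end{pmatrix} .$$

Sweep $\Sigma$ on $Y_{-1}$ to obtain

$$\Sigma_1 = \begin{pmatrix} 1-\rho^2(1) & \rho(1) & \sigma^2(\beta_1 - \rho(1)) \\ -\rho(1) & 1 & -\sigma^2 \\ \sigma^2(\beta_1-\rho(1)) & \sigma^2 & \sigma^2(1-\sigma^2) \end{pmatrix} .$$

Sweep $\Sigma$ on $Y^{\nu}_{-1}$ to obtain

$$\Sigma_2 = \begin{pmatrix} 1-\sigma^2\beta_1^2 & \rho(1)-\sigma^2\beta_1 & \beta_1 \\ \rho(1)-\sigma^2\beta_1 & 1-\sigma^2 & 1 \\ -\beta_1 & -1 & 1/\sigma^2 \end{pmatrix} .$$

Sweep $\Sigma_1$ on $Y^{\nu}_{-1}$ or sweep $\Sigma_2$ on $Y_{-1}$ to obtain a matrix which we write in the following form:

$$\begin{pmatrix} (1-\rho^2(1)) - \dfrac{(\beta_1-\rho(1))^2\sigma^2}{1-\sigma^2} & \dfrac{\rho(1)-\sigma^2\beta_1}{1-\sigma^2} & \dfrac{\beta_1-\rho(1)}{1-\sigma^2} \\[2ex] -\dfrac{\rho(1)-\sigma^2\beta_1}{1-\sigma^2} & \dfrac{1}{1-\sigma^2} & \dfrac{-1}{1-\sigma^2} \\[2ex] -\dfrac{\beta_1-\rho(1)}{1-\sigma^2} & \dfrac{-1}{1-\sigma^2} & \dfrac{1}{\sigma^2(1-\sigma^2)} \end{pmatrix} .$$

We conclude that

$$\Sigma(Y|Y_{-1}) = 1-\rho^2(1), \qquad (Y|Y_{-1})(t) = \rho(1)\, Y_{-1}(t),$$

$$\Sigma(Y|Y^{\nu}_{-1}) = 1 - \sigma^2\beta_1^2, \qquad (Y|Y^{\nu}_{-1})(t) = \beta_1\, Y^{\nu}_{-1}(t),$$

$$\Sigma(Y|Y_{-1},Y^{\nu}_{-1}) = (1-\rho^2(1)) - \frac{(\beta_1-\rho(1))^2\sigma^2}{1-\sigma^2},$$

$$(Y|Y_{-1},Y^{\nu}_{-1})(t) = \frac{\rho(1)-\sigma^2\beta_1}{1-\sigma^2}\, Y_{-1}(t) + \frac{\beta_1-\rho(1)}{1-\sigma^2}\, Y^{\nu}_{-1}(t) .$$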
These coefficients of $Y_{-1}(t)$ and $Y^{\nu}_{-1}(t)$ can be used as initial (or perhaps even final) values for an efficient parameter estimation algorithm for an ARMA(1,1).

As a check on these formulas, note that for an MA(1), $Y(t) = \varepsilon(t) + b\,\varepsilon(t-1)$, $\beta_1 = b$, $\sigma_\infty^2 = (1+b^2)^{-1}$, $\rho(1) = b/(1+b^2)$. The coefficients of $Y_{-1}(t)$ and $Y^{\nu}_{-1}(t)$ in the predictor are respectively 0 and $b$.

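For a numerical illustration of these formulas, consider the ARMA(1,1) model $Y(t) - a\,Y(t-1) = Y^{\nu}(t) + b\,Y^{\nu}(t-1)$. Then $\beta_1 = a+b$, $1 = \sigma_\infty^2\,(1 + \beta_1^2/(1-a^2))$, $\rho(1) = [\beta_1 + \beta_1^2 a/(1-a^2)]\,\sigma_\infty^2$. For $a = b = 0.5$, $\beta_1 = 1$, $\sigma_\infty^2 = 3/7$, $\rho(1) = 5/7$. The general formulas yield the values assumed in the model: the coefficients of $Y_{-1}(t)$ and $Y^{\nu}_{-1}(t)$ in the predictor are both $(2/7)/(4/7) = 0.5$.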
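The numerical illustration can be reproduced with a small sweep routine. The sketch below assumes NumPy and one particular sweep convention (pivot replaced by its reciprocal, pivot column divided by the pivot, pivot row divided by the pivot and negated); that convention is an assumption made for this example, chosen so that the first row of the swept matrix holds the conditional variance followed by the regression coefficients. The function name `sweep` is illustrative.

```python
import numpy as np

def sweep(matrix, k):
    """Sweep a square matrix on pivot index k (convention assumed for this sketch)."""
    m = np.asarray(matrix, dtype=float)
    d = m[k, k]
    out = m - np.outer(m[:, k], m[k, :]) / d   # a_ij - a_ik * a_kj / a_kk
    out[:, k] = m[:, k] / d                    # pivot column: regression coefficients
    out[k, :] = -m[k, :] / d                   # pivot row: negated
    out[k, k] = 1.0 / d
    return out

# ARMA(1,1) with a = b = 0.5: beta_1 = 1, sigma^2 = 3/7, rho(1) = 5/7.
beta1, s2, rho1 = 1.0, 3.0 / 7.0, 5.0 / 7.0
cov = np.array([[1.0,        rho1, s2 * beta1],
                [rho1,       1.0,  s2],
                [s2 * beta1, s2,   s2]])       # order: Y, Y_{-1}, Ynu_{-1}

swept_ar = sweep(cov, 1)                       # condition on Y_{-1}
swept_ma = sweep(cov, 2)                       # condition on Ynu_{-1}
swept_both = sweep(swept_ar, 2)                # condition on both
swept_both_alt = sweep(swept_ma, 1)            # same result in the other order

cond_var, coef_y, coef_nu = swept_both[0]
```

Here `cond_var` is the conditional variance of $Y$ given both predictors, and `coef_y`, `coef_nu` are the coefficients of $Y_{-1}(t)$ and $Y^{\nu}_{-1}(t)$.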

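To test whether a time series $Y(\cdot)$ obeys an ARMA(1,1), form

$$I(Y|Y_{-1},Y^{\nu}_{-1};\, Y^-) = \tfrac{1}{2} \log \frac{1}{\sigma_\infty^2} \Big\{ 1 - \rho^2(1) - \frac{(\beta_1-\rho(1))^2 \sigma_\infty^2}{1-\sigma_\infty^2} \Big\} .$$

This information number equals 0 if the time series obeys any one of the schemes AR(1), MA(1), or ARMA(1,1). The information numbers for an AR(1) and MA(1) are respectively

$$I(Y|Y_{-1};\, Y^-) = \tfrac{1}{2} \log \frac{1-\rho^2(1)}{\sigma_\infty^2},$$

$$I(Y|Y^{\nu}_{-1};\, Y^-) = \tfrac{1}{2} \log \Big( \frac{1}{\sigma_\infty^2} - \beta_1^2 \Big) .$$

One accepts $H_0$: $Y(\cdot)$ is ARMA(1,1) if the last two information numbers are different from zero, but $I(Y|Y_{-1},Y^{\nu}_{-1};\, Y^-) = 0$.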

For the ARMA(1,1) model $Y(t) - 0.5\,Y(t-1) = Y^{\nu}(t) + 0.5\,Y^{\nu}(t-1)$,

$$I(Y|Y_{-1};\, Y^-) = \tfrac{1}{2} \log \tfrac{8}{7} = .067,$$

$$I(Y|Y^{\nu}_{-1};\, Y^-) = \tfrac{1}{2} \log \tfrac{4}{3} = .144 .$$

When information $I_{p,q}$ is estimated from a sample of size $T$, a penalty term $(1+p+q)/T$ is subtracted from the estimated information $\hat I_{p,q}$ in the Akaike information approach. If .067 were an estimated value of $I(Y|Y_{-1};\, Y^-)$ it would be regarded as significantly different from zero if $.067 - (2/T) > 0$, which is true for $T \ge 30$.

To identify the best orders $p, q$ of approximating ARMA($p,q$) one could use subset regression techniques to steer the calculation of $I_{p,q}$.

Alternatively one could compute the information numbers of AR($p$), MA($q$), ARMA($p,q$) for $p, q = 1, \ldots, M$ (a specified upper limit). Subtract from each estimated information number a penalty $(1+p+q)/T$. Then sort the array of penalized estimated information numbers $\hat I_{p,q}$ to determine the orders $(p,q)$ of schemes with the largest amount of information (and which therefore minimize $I_\infty - I_{p,q}$ and correspond to best approximating ARMA schemes by this measure of divergence between probability distributions).
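The penalize-and-sort step can be sketched as follows. The example is illustrative only: the function name `rank_orders` is hypothetical, and the "estimated" information numbers are stand-ins computed from the true values of the ARMA(1,1) example with $a = b = 0.5$ ($\sigma_\infty^2 = 3/7$, $\rho(1) = 5/7$, $\beta_1 = 1$), not estimates from data.

```python
import math

def rank_orders(info_hat, T):
    """Subtract the penalty (1 + p + q) / T from each information number and
    sort the orders (p, q) by decreasing penalized information."""
    penalized = {pq: val - (1 + pq[0] + pq[1]) / T for pq, val in info_hat.items()}
    return sorted(penalized.items(), key=lambda item: item[1], reverse=True)

# Stand-in values of I_{p,q} = -1/2 log sigma^2_{p,q} (true values, not estimates).
sigma2, rho1, beta1 = 3.0 / 7.0, 5.0 / 7.0, 1.0
info_hat = {
    (1, 0): -0.5 * math.log(1.0 - rho1**2),            # AR(1)
    (0, 1): -0.5 * math.log(1.0 - sigma2 * beta1**2),  # MA(1)
    (1, 1): -0.5 * math.log(sigma2),                   # ARMA(1,1)
}
ranking = rank_orders(info_hat, T=100)
best_order = ranking[0][0]
```

With these inputs the ARMA(1,1) scheme retains the largest penalized information, followed by AR(1) and then MA(1).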

4.  Nonparametric Estimation of MA($\infty$) Representation

An information approach to computing $I_{p,q}$ and thus identifying best fitting schemes has been described which is based on estimating the coefficients of the MA($\infty$) representation. Two possible methods for non-parametric MA($\infty$) estimation are described in this section: (1) approximating long autoregressive schemes; (2) cepstral correlations. The two methods may be used simultaneously for greater confidence in the results obtained. Both methods require further theoretical investigation [compare Bhansali (1982)].

Denote the MA($\infty$) representation of $Y(t)$ by

$$Y(t) = b(0)\, Y^{\nu}(t) + b(1)\, Y^{\nu}(t-1) + \cdots$$

where $b(0) = 1$. Denote the AR($\infty$) representation by

$$a(0)\, Y(t) + a(1)\, Y(t-1) + \cdots = Y^{\nu}(t)$$

where $a(0) = 1$.

The approximating long autoregressive scheme estimates the AR($\infty$) representation of a time series $Y(\cdot)$ by a finite order AR($p$) scheme

$$Y(t) + \alpha_p(1)\, Y(t-1) + \cdots + \alpha_p(p)\, Y(t-p) = \varepsilon(t)$$

whose order $p$ is determined by an order determining scheme [such as AIC, due to Akaike, or CAT, due to Parzen]. The generating functions

$$h_\infty(z) = 1 + b(1)\,z + b(2)\,z^2 + \cdots$$

$$g_\infty(z) = 1 + a(1)\,z + a(2)\,z^2 + \cdots$$

$$g_p(z) = 1 + \alpha_p(1)\,z + \cdots + \alpha_p(p)\,z^p$$

satisfy

$$g_\infty(z)\, h_\infty(z) = 1 .$$

One can solve recursively for $b(j)$ using the recursion, for $k \ge 1$,

$$a(0)\, b(k) + a(1)\, b(k-1) + \cdots + a(k)\, b(0) = 0 .$$

When $g_\infty(z)$ is approximated by $g_p(z)$, one replaces $a(k)$ by $\alpha_p(k)$; note that $\alpha_p(k) = 0$ for $k > p$. The approximating autoregressive method of estimating the MA($\infty$) representation often yields reasonable results in practice. However it is difficult to study its properties theoretically.
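The recursion for $b(k)$ from a finite set of autoregressive coefficients can be sketched in plain Python. The function name `ar_to_ma` is illustrative; the input is the list of coefficients of a fitted AR($p$) scheme, with coefficients beyond lag $p$ treated as zero as described above.

```python
def ar_to_ma(ar, n):
    """Return [b(0), ..., b(n)] with b(0) = 1 and, for k >= 1,
    b(k) = -(a(1) b(k-1) + ... + a(k) b(0)), where a(j) = 0 for j > p.

    ar : [a(1), ..., a(p)] in Y(t) + a(1) Y(t-1) + ... + a(p) Y(t-p) = e(t)
    """
    b = [1.0]
    for k in range(1, n + 1):
        b.append(-sum(ar[j - 1] * b[k - j] for j in range(1, min(k, len(ar)) + 1)))
    return b

# AR(1) scheme Y(t) - 0.5 Y(t-1) = e(t): b(k) = 0.5**k.
b_ar1 = ar_to_ma([-0.5], 4)

# AR(2) scheme with g(z) = (1 - 0.2 z)(1 - 0.3 z) = 1 - 0.5 z + 0.06 z^2.
b_ar2 = ar_to_ma([-0.5, 0.06], 3)
```

For the AR(2) example the coefficients should equal $\sum_{i=0}^{k} 0.2^{i}\, 0.3^{k-i}$, the power series of $1/g(z)$.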

The cepstral correlation method is available for short memory time series; then $\log f(\omega)$ is integrable, and can be used to compute $I_\infty$ using the fundamental formula (due to Kolmogorov and Szegő)

$$\log \sigma_\infty^2 = \int_0^1 \log f(\omega)\, d\omega .$$

The cepstral correlations are defined by, for $v = 0, \pm 1, \ldots$,

$$\psi(v) = \int_0^1 e^{2\pi i v \omega} \log f(\omega)\, d\omega .$$

The name "cepstral correlations" is intended to connote that $\psi(v)$ is the Fourier transform of $\log f(\omega)$. However the sequence $\{\psi(v)\}$ does not share an essential property of the sequence $\{\rho(v)\}$ of correlations; the cepstral correlations are not non-negative definite since $\log f(\omega)$ is not non-negative. Define

$$\Psi(z) = \sum_{k=1}^{\infty} \psi(k)\, z^k, \qquad \Psi^*(z) = \sum_{k=1}^{\infty} \psi(-k)\, z^{-k} .$$

Then

$$f(\omega) = \sigma_\infty^2\, |h_\infty(e^{2\pi i \omega})|^2,$$

$$\log f(\omega) = \psi(0) + \Psi(e^{2\pi i \omega}) + \Psi^*(e^{2\pi i \omega}) .$$

A very important relation [which goes back to the dawn of modern time series analysis, due to Kolmogorov (1939)] is

$$h_\infty(z) = \exp \Psi(z) .$$

One can obtain an explicit formula for $b(k)$ in terms of $\psi(k)$; thus Janacek (1982) writes

$$b(1) = \psi(1),$$

$$b(2) = \psi(2) + \psi^2(1)/2!,$$

$$b(3) = \psi(3) + \psi(1)\,\psi(2) + \psi^3(1)/3! .$$

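A more useful representation of the formula for $b(k)$ in terms of $\psi(k)$ has been given by Pourahmadi (1982):

$$b(n+1) = \sum_{j=0}^{n} \Big(1 - \frac{j}{n+1}\Big)\, \psi(n+1-j)\, b(j) .$$

We outline Pourahmadi's proof; differentiate with respect to $z$ the relation $h_\infty = \exp \Psi$. Obtain $h_\infty' = h_\infty \Psi'$; explicitly

$$\sum_{n=1}^{\infty} n\, b(n)\, z^{n-1} = \Big\{ \sum_{n=0}^{\infty} b(n)\, z^n \Big\} \Big\{ \sum_{n=1}^{\infty} n\, \psi(n)\, z^{n-1} \Big\}$$

or

$$\sum_{n=0}^{\infty} (n+1)\, b(n+1)\, z^n = \Big\{ \sum_{n=0}^{\infty} b(n)\, z^n \Big\} \Big\{ \sum_{n=0}^{\infty} (n+1)\, \psi(n+1)\, z^n \Big\} .$$

Therefore

$$(n+1)\, b(n+1) = \sum_{k=0}^{n} (k+1)\, \psi(k+1)\, b(n-k) = \sum_{j=0}^{n} b(j)\, (n+1-j)\, \psi(n+1-j) .$$

Divide by $n+1$ to obtain the desired conclusion.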


Pourahmadi (1982) also states a recursive formula for computation of the AR($\infty$) coefficients $a(k)$ from $\psi(k)$:

$$a(n+1) = -\sum_{j=0}^{n} \Big(1 - \frac{j}{n+1}\Big)\, \psi(n+1-j)\, a(j) .$$

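The properties of cepstral correlations can be understood by examining their values in the case of an AR(1); then

$$f(\omega) = \sigma_\infty^2\, |1 - \rho\, e^{2\pi i \omega}|^{-2}$$

where $|\rho| < 1$. Then, for $k \ge 1$,

$$\psi(k) = -\int_0^1 \log \{1 - \rho\, e^{-2\pi i \omega}\}\, e^{2\pi i k \omega}\, d\omega = \frac{1}{k}\, \rho^k .$$

The rate of decay of $k\,\psi(k)$, $k = 1, 2, \ldots$, is a measure of the memory of the time series.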
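The two recursions from cepstral correlations to MA($\infty$) and AR($\infty$) coefficients can be checked on the AR(1) case, where $\psi(k) = \rho^k/k$ should give $b(k) = \rho^k$ and $a(1) = -\rho$, $a(k) = 0$ for $k \ge 2$. The sketch below is plain Python; the function names `cepstrum_to_ma` and `cepstrum_to_ar` are illustrative.

```python
def cepstrum_to_ma(psi, n):
    """b(0) = 1 and b(m+1) = sum_{j=0}^{m} (1 - j/(m+1)) psi(m+1-j) b(j).

    psi : sequence with psi[k] = psi(k) for k = 0..n (psi[0] is not used)
    """
    b = [1.0]
    for m in range(n):
        b.append(sum((1 - j / (m + 1)) * psi[m + 1 - j] * b[j] for j in range(m + 1)))
    return b

def cepstrum_to_ar(psi, n):
    """a(0) = 1 and a(m+1) = -sum_{j=0}^{m} (1 - j/(m+1)) psi(m+1-j) a(j)."""
    a = [1.0]
    for m in range(n):
        a.append(-sum((1 - j / (m + 1)) * psi[m + 1 - j] * a[j] for j in range(m + 1)))
    return a

rho = 0.5
n = 6
psi = [0.0] + [rho**k / k for k in range(1, n + 1)]   # AR(1) cepstral correlations
b = cepstrum_to_ma(psi, n)
a = cepstrum_to_ar(psi, n)
```

The third coefficient also agrees with the explicit expression $b(3) = \psi(3) + \psi(1)\psi(2) + \psi^3(1)/3!$.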

To estimate $\psi(k)$ from a sample $Y(t)$, $t = 1, \ldots, T$, one could take the logarithm of the sample spectral density (computed for $\omega = k/Q$, where one should choose $Q \ge 2T$)

$$\tilde f(\omega) = \Big| \sum_{t=1}^{T} Y(t) \exp(2\pi i \omega t) \Big|^2 \div \sum_{t=1}^{T} |Y(t)|^2$$
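or a smoothed estimator $\hat f(\omega)$ of $f(\omega)$. Then $\psi(v)$ is estimated by the discrete Fourier transform of the logarithm,

$$\hat\psi(v) = \frac{1}{Q} \sum_{k=0}^{Q-1} \log \hat f\Big(\frac{k}{Q}\Big) \exp(2\pi i v k / Q) .$$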


A convenient formula for $\hat f(\omega)$ is the windowed periodogram of bandwidth $1/T$ defined by

$$\hat f(\omega) = \sum_{|v| < T} k\Big(\frac{v}{T}\Big)\, \hat\rho(v) \exp(2\pi i v \omega)$$

where $\hat\rho(v)$ is the sample correlation function computed by

$$\hat\rho(v) = \frac{1}{Q} \sum_{k=0}^{Q-1} \tilde f\Big(\frac{k}{Q}\Big) \exp(2\pi i v k / Q)$$

and $k(t)$ is a suitable kernel (providing non-negative estimators) such as the Parzen window

$$k(t) = \begin{cases} 1 - 6t^2 + 6|t|^3, & |t| \le 0.5, \\ 2\,(1 - |t|)^3, & 0.5 \le |t| \le 1, \\ 0, & 1 \le |t| . \end{cases}$$

A kernel with superior properties (but not necessarily non-negative estimates) is the spline-equivalent window [Parzen (1958), Cogburn and Davis (1974), Wahba (1980)]


where $r$ is usually chosen to equal 2 or 4.
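The estimation steps of this section (sample spectral density on a grid, sample correlations by inverse FFT, Parzen-windowed spectral estimate, cepstral correlations from its logarithm) can be sketched as below. This is an illustration under stated assumptions: NumPy's FFT is used, the grid size $Q$ is taken as the smallest power of two with $Q \ge 2T$, and the truncation point `trunc` of the lag window is a free tuning parameter chosen for the example. The function names, the seed, and the simulated AR(1) series are hypothetical.

```python
import numpy as np

def parzen_window(t):
    """Parzen lag window k(t)."""
    t = np.abs(np.asarray(t, dtype=float))
    inner = 1.0 - 6.0 * t**2 + 6.0 * t**3
    outer = 2.0 * (1.0 - t) ** 3
    return np.where(t <= 0.5, inner, np.where(t <= 1.0, outer, 0.0))

def cepstral_estimates(y, trunc, n_lags):
    """Return (rho_hat[0..n_lags], psi_hat[0..n_lags]) for a zero-mean series y."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    Q = 1 << int(np.ceil(np.log2(2 * T)))            # grid size, Q >= 2T
    # Sample spectral density at w = k/Q, normalized by the sum of squares.
    f_tilde = np.abs(np.fft.fft(y, Q)) ** 2 / np.sum(y**2)
    # Sample correlations: (1/Q) sum_k f_tilde(k/Q) exp(2 pi i v k / Q).
    rho_hat = np.fft.ifft(f_tilde).real
    # Lag-window the correlations; index v and Q - v both carry lag |v|.
    lags = np.arange(Q)
    lags = np.minimum(lags, Q - lags)
    f_hat = np.fft.fft(parzen_window(lags / trunc) * rho_hat).real
    # Cepstral correlations from the logarithm of the smoothed estimate.
    psi_hat = np.fft.ifft(np.log(f_hat)).real
    return rho_hat[: n_lags + 1], psi_hat[: n_lags + 1]

# Simulated AR(1) series Y(t) = 0.5 Y(t-1) + e(t); theory: psi(k) = 0.5**k / k.
rng = np.random.default_rng(1)
T = 4096
noise = rng.standard_normal(T)
y = np.empty(T)
y[0] = noise[0]
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + noise[t]

rho_hat, psi_hat = cepstral_estimates(y, trunc=64, n_lags=3)
```

For this series `psi_hat[0]` estimates $\log \sigma_\infty^2 = \log 0.75$, and `psi_hat[1]`, `psi_hat[2]` estimate $0.5$ and $0.125$; the estimates can then be passed to a recursion for the MA($\infty$) coefficients.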

An obvious moral of the foregoing formulas is that modern time series model identification requires the scientist to integrate time domain and frequency domain techniques. The cepstral correlations approach to ARMA model identification also may provide a unification of ARMA models and the exponential spectral models introduced by Bloomfield (1973).

5.  Conclusion 

Given a sample of time series, one should estimate its correlations $\hat\rho(v)$ and cepstral correlations $\hat\psi(v)$ through fast Fourier transformation from the sample spectral density $\tilde f(\omega)$ and its logarithm $\log \tilde f(\omega)$.

Using the estimated correlations, the Yule-Walker equations are solved to estimate innovation variances $\hat\sigma_m^2$, $m = 1, 2, \ldots$. Order determining criteria, such as AIC and CAT, are applied to this sequence to determine orders $m$ of approximating AR schemes, to determine the memory type of the time series [Parzen (1982)], and to form autoregressive estimators of $f(\omega)$, $\log f(\omega)$, and $\psi(v)$.

When  a  time  series  is  classified  as  short  memory 
the  estimated  cepstral  correlations  are  used  to 
form  the  MA(«)  coefficients  b(k).  They  are  used 
to  form  information  numbers  (via  sweep  or  subset 
regression  procedures)  for  determining  best 
fitting  ARMA  schemes,  and  the  corresponding  ARMA 
spectral  density  estimator. 

We  do  not  believe  that  spectral  estimation  is  a 
non-parametric  procedure  to  be  conducted 
independently  of  model  identification.  The 
final  form  of  spectral  estimator  should  be 
based  on  an  identification  of  the  type  (AR,  MA, 
or  ARMA)  of  the  whitening  filter  of  a  short 
memory  time  series. 

Statistical  computing  has  a  vital  role  in  time 
series  analysis  in  two  important  ways:  (1)  to 
rapidly  make  available  to  the  broader  scientific 
community  new  algorithms  for  time  series 
analysis;  (2)  to  make  old  theoretical  ideas  of 
time  series  analysis  practically  useful  and  to 
stimulate the integration of old and new
techniques  of  time  series  analysis. 

For  other  aspects  of  the  role  of  entropy  and 
information  measures  in  model  identification, 
see Akaike (1977) and IFAC (1982). For modeling of multiple time series, see Parzen and Newton (1980), Newton (1983), and Cooper and Wood (1982). A review (and power study) of some standard statistical procedures for determining the orders p and q of an ARMA scheme is given by Clarke and Godolphin (1982).